A Framework for Populating Ontological Models from Semi-structured Web Documents
Authors
Abstract
The Web is the largest repository of information that has ever existed. This information is presented in a human-friendly format using HTML, which complicates its consumption by automatic processes. The Semantic Web and Web Services are solutions to this problem, but most web sites do not offer such services. This has increased interest in information extraction, which allows information from web documents to be extracted and structured in ontological models. Despite the large number of proposals on information extraction, no universally applicable information extractor exists. As a consequence, populating an ontological model automatically from a web site often requires more than one information extractor. We propose a framework that supports the development, training, and application of information extractors on semi-structured web documents to produce semantic data. We have developed a version of the framework and verified it by means of experiments on 35 web sites. Experimental results are very promising.
Similar resources
Populating Ontologies with Data from OCRed Lists
A flexible, accurate, and efficient method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selectio...
A Case Study on Linked Data Generation and Consumption
The availability of large amounts of interlinked semantic data is a fundamental prerequisite of the Semantic Web. At present, almost all the usable ontological data is built manually or by directly transforming certain (semi-)structured data sources into certain formats of semantic data. To solve the “isolated data island” problem of the Semantic Web caused by this situation, the Linking Open D...
Toward Ontology-based Knowledge Extraction from Web Data with the Lexicalization of Ontology for Korean QA System
Most knowledge is written in natural language, and a structured knowledge base includes only part of that information. From a QA system perspective, the quality of a knowledge base depends on how well it covers the knowledge needed to answer users' questions. To deal with this knowledge base construction problem, we define the natural language question sets and answer documents which contains know...
Information extraction and imprecise query answering from web documents
Word based searches for relevant information from texts retrieve a huge collection and burden the user with information overload. Ontology based text information retrieval can perform concept-based search and extract only relevant portions of text containing concepts that are present in the query or those that are semantically linked to query concepts. While these systems have better precision ...
Searching web data: An entity retrieval and high-performance indexing model
More and more (semi) structured information is becoming available on the Web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with ...